Enhancing Privacy in Distributed Data Clustering

Authors

  • LUONG THE DUNG
  • HO TU BAO
Abstract

A protocol for privacy-preserving clustering with distributed EM mixture modeling has been proposed previously. However, it is not completely secure, because in some situations it reveals more than just the model parameters. In particular, when the dataset is horizontally partitioned into only two parts, extra information is revealed. This work makes two contributions. First, we develop a more general and more secure protocol that allows an arbitrary number of participating parties. Second, we propose a better method for the case in which the dataset is horizontally partitioned into only two parts. This method allows the covariance matrices and final results to be computed without revealing private information or the cluster centers.
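
To make the setting concrete, the following is a minimal sketch of how horizontally partitioned parties might combine local EM statistics through an additive-sharing secure sum. It is not the protocol from the paper, whose details are not given in this abstract. The names `secure_sum`, `global_mean`, `MODULUS`, and `SCALE` are hypothetical, the data is one-dimensional for brevity, and `random.Random` stands in for a cryptographically secure generator.

```python
import random

MODULUS = 2**61 - 1   # assumed modulus for additive masking (illustrative choice)
SCALE = 10**6         # assumed fixed-point scale for real-valued statistics


def encode(x):
    """Map a real number to a fixed-point residue modulo MODULUS."""
    return int(round(x * SCALE)) % MODULUS


def decode(v):
    """Map a residue back to a real number; large residues represent negatives."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def make_shares(value, n_parties, rng):
    """Split an encoded value into additive shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(local_values, rng):
    """Sum private values; each party publishes only a sum of received shares."""
    n = len(local_values)
    # Row i holds the shares party i sends out (share j goes to party j).
    share_matrix = [make_shares(encode(v), n, rng) for v in local_values]
    # Party j adds the shares it received and reveals only that partial sum.
    partials = [sum(share_matrix[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    return decode(sum(partials) % MODULUS)


def global_mean(local_points, responsibilities, rng):
    """M-step mean of one mixture component from per-party 1-D data."""
    weighted_sums = [sum(r * x for r, x in zip(rs, xs))
                     for xs, rs in zip(local_points, responsibilities)]
    weights = [sum(rs) for rs in responsibilities]
    return secure_sum(weighted_sums, rng) / secure_sum(weights, rng)


# Illustration only: a seeded, non-cryptographic generator for repeatability.
rng = random.Random(0)

# Three parties, each holding its own rows and E-step responsibilities.
mean = global_mean([[1.0, 2.0], [3.0], [4.0, 6.0]],
                   [[1.0, 1.0], [1.0], [1.0, 1.0]], rng)

# Two-party case: party A knows its own input, so the revealed total
# determines party B's input exactly, however the sum was computed.
total = secure_sum([5.0, 7.5], rng)
other_party_value = total - 5.0
```

The last two lines show why the two-party case needs separate treatment: any scheme that reveals an intermediate sum to both parties lets each one subtract its own contribution and recover the other's. With three or more parties the same subtraction yields only the combined contribution of the remaining parties.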

Similar resources

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Comprehensive Research on Privacy Preserving Emphasizing on Distributed Clustering

Information is often sensitive or private in nature, and mining such data can violate the privacy of individuals. Privacy-preserving data mining (PPDM) mines the data while aiming to preserve the privacy of sensitive data without ever actually seeing it. This paper recaps the important techniques in PPDM, such as anonymization, perturbation, and cryptography. Nowadays, data mini...

A High Performance Privacy Preserving Clustering Approach in Distributed Networks

Privacy preservation in data mining over distributed networks is still an important research issue in knowledge and data engineering and in community-based clustering approaches. Privacy is an important factor when datasets are integrated from different data holders or players for mining. Secure mining of data is required in an open network. In this paper we propose an efficient ...

Privacy-Awareness of Distributed Data Clustering Algorithms Revisited

Several privacy measures have been proposed in the privacy-preserving data mining literature. However, these measures assume either a centralized data source or that no insider will try to infer information. This paper presents distributed privacy measures that take into account collusion attacks and point-level breaches for distributed data clustering. An analysis of representative ...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...

Sequential Clustering Algorithms for Anonymizing Social Networks

Privacy preservation in social networks is a major problem nowadays. In the distributed setting, the complex data is divided among several data holders. The goal is to arrive at an anonymized view of the unified network without revealing to any data holder information about links between nodes that are held by other data holders. To that end, in the centralized setting two varian...


Publication date: 2010